In this study we explored the mode of genetic inheritance of Varroa destructor, a parasitic mite of honeybees, so far known as a haplodiploid species with a arrhenotokous parthenogenesis reproductive mode. The mite shows contrasting phenomena of highly inbreeding life style (sib-mating), yet maintaining relatively high genetic variation.

The experimental setup consisted of a three-generational pedigree.
Sample size: 30 families, total of 223 individuals.
3 different colonies, from OIST apiary.
For details of mite collection and pedigree construction, please see the original manuscript.
All biosamples are available in Sequence Read Archive (SRA) under the accession PRJNA794941.
All data are available and reproducible from the GitHub page.

load libraries

library("tidyverse")
library("plyr")
library("dplyr")
library("ggplot2")
library("scales")
library("ggpubr")
library("gridExtra")
library("grid")
library("GGally")
library("vcfR") # for extracting genotype data from a vcf file
library("data.table")
library("stringr")
library("janitor")
library("gmodels")
library("rstatix")
library("freqtables")
library("broom")
library("DescTools") # for the Goodness-of-Fit test, 3 variables; and for Breslow-Day Test for Homogeneity of the Odds Ratios
library("vcd") # for the woolf test testign log odds for each pair
library("patchwork") # for gathering the plots
#library("fuzzyjoin") # to join tables based on a string in a column
#library("aspi") # Repeated G–tests of Goodness-of-Fit, work only for 2 variables..
#library("RVAideMemoire") # Repeated G–tests of Goodness-of-Fit, work only for 2 variables...
#library("InfoTrad")
#library("ggthemes") # for more colors in the ggplot
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE,
                     fig.width = 20,
                      fig.asp = 0.6,
                      out.width = "100%")
#fig.width = 6,fig.asp = 0.8,out.width = "100%"

Load Variant Call Format (VCF) file.

Extract genotypes for each site and individual. The metadata for all samples can be found in here.

vcf <- read.vcfR("/Users/nuriteliash/Documents/GitHub/varroa-linkage-map/data/vcf_filter/Q40BIALLDP16HDP40mis.5Chr7/Q40BIALLDP16HDP40mis.5Chr7.recode.vcf", verbose = FALSE )
vcf
## ***** Object of Class vcfR *****
## 223 samples
## 7 CHROMs
## 35,169 variants
## Object size: 293.5 Mb
## 0 percent missing data
## *****        *****         *****
# extract the genotype for each site in each individual
gt <- extract.gt(vcf, element = "GT") 
gt <- as.data.frame(t(gt)) %>%
  rownames_to_column("ID")
#clean the ID names
gt$ID <- sub("_[^_]+$", "", gt$ID)

The individual ID name nomenclature is composed of 3 parts, separated by an underscore (_).
The first part is the family ID, the second is the unique name of the individual, and the the third part indicates its generation, sex, and its position in relation to the foundress mite (generation F0).
For example: Individual ID 240_240a_son, belong to family 240, has a unique name of 240a and is the son of the foundress mite of family 240, that is, its a male of F1 generation.
Individual ID 240_241b_grndat, belong to family 240, has a unique name of 241b and is the granddaughter of the foundress mite of family 240, that is, its a female of F2 generation (and the daughter of 240_240a_son).


From the 223 individuals of 30 families, we excluded non adults mites (for which sex could not be determined for sure).
Eventually, we kept 144 individuals (adult males and females in F0, F1 and F2 generations) of 26 families, and all 35,169 biallelic sites.


For example, these are the first 6 sites (13565-25513) of chromosome (NW_019211454.1), in family number 240.

table <-  gt %>% 
  t() %>%
  as.data.frame() %>%
  row_to_names(row_number = 1)
  #dplyr::select(contains(c("son", "dat", "fnd"))) # keep only adults of F0, F1 and F2 

table %>% select(contains(c("240_"))) %>% head()
##                      240_241b_grndat 240_240a_son 240_241e_grn 240_241d_grn
## NW_019211454.1_13565            <NA>          0/1          0/1         <NA>
## NW_019211454.1_18037            <NA>          0/0          0/0          0/0
## NW_019211454.1_20998            <NA>         <NA>         <NA>          0/0
## NW_019211454.1_24935            <NA>         <NA>         <NA>          0/0
## NW_019211454.1_25470            <NA>         <NA>         <NA>          0/0
## NW_019211454.1_25513            <NA>         <NA>         <NA>          0/0
##                      240_241a_grndat 240_241c_grnson 240_240_fnd 240_241_dat
## NW_019211454.1_13565             0/1            <NA>         0/0         0/1
## NW_019211454.1_18037             0/0            <NA>         0/0         0/0
## NW_019211454.1_20998             0/0            <NA>         0/0         0/0
## NW_019211454.1_24935             0/0             0/0         0/0         0/0
## NW_019211454.1_25470             0/0            <NA>         0/0         0/0
## NW_019211454.1_25513             0/0            <NA>         0/0        <NA>

The family members include:

  • F0 generation: foundress female mite (240_240_fnd).
  • F1 generation: adult son (240_240a_son) and adult daughter (240_241_dat) of the foundress F0 mite. Because in varroa mite reproduction is via sib-mating, these brother and sister will also mate, to produce the F2 generation.
  • F2 generation: adult grandson (240_241c_grnson) and two adult granddaughters (240_241a_grndat and 240_241b_grndat), of the foundress F0 mite.

Each individual, was genotyped for each site with one of the three genotypes:

  • Homozygote for the reference allele = 0/0
  • Heterozygote = 0/1
  • Homozygote for the alternate allele = 1/1
  • “NA” = site genotype not determined

For more information about the mapping, variant calling and variant filtration, please see the Snakemake pipeline in here.


In the following code we viewed the F2 generation genotype of all possible nine crosses between F1 male and females.

  • Control crosses of homozygotic sites: The first four crosses were aimed mainly to detect false sites:
    1. F1 male (0/0) x female (0/0)
    2. F1 male (1/1) x female (1/1)
    3. F1 male (0/0) x female (1/1)
    4. F1 male (1/1) x female (0/0)

Then we crossed heterozygotic sites, to explore the mode of genetic inheritance in varroa mite:

  • Homozygotic male, with heterozygotic female:
    1. F1 male (0/0) x female (0/1)
    2. F1 male (1/1) x female (0/1)
  • Heterozygotic male, with homozygotic female:
    1. F1 male (0/1) x female (0/0)
    2. F1 male (0/1) x female (1/1)
  • Heterozygotic male, with heterozygotic female:
    1. F1 male (0/1) x female (0/1)

In addition to the fixed F1 genotypes, we also “fixed” the F0 female genotype to match that of her son (F1 male), so we can phase the alleles of the F1 generations.
“Phasing alleles” is the process of determine the parental origin of each allele.
for heterozygotic genotype phasing is critical, as it allows the tracking of the allele from one generation tot he next.
For each crossing we have expected genotype-ratio of the F2, based on the following assumptions under arrhenotokous haplodiploid system:

  • Fertilized egg develops into diploid female ♀ (2n).
  • Unfertilized egg develops into haploid male ♂ (n).
  • No paternal inheritance in the males.

read the meta data table and set the family

meta <- read_csv("/Users/nuriteliash/Documents/GitHub/varroa-linkage-map/data/meta_more.csv")

# include all 30 families, and all F2 samples, but indicate if they have an adult sister (may indicate if the F1 female was fertilized)
family = grep("fnd",gt$ID, value=TRUE) %>%
  str_extract("[^_]+")  %>%
  unique()

Control: Homozygotic sites (crosses 1-4)

To detect false sites

(1) F1 male (0/0) x female (0/0)

we expect all F2 offspring to be homozygotic (0/0) like their parents (F1)

p_male_pooled <- ggplot(filter(together, sex=="male"), aes(fill=gt, y=freq, x=adult_sisters)) + 
    geom_bar(position="fill", stat="identity", ) +
    xlab("F2 males") + 
    ylab("Proportion of F2 genotype") +
    labs(title = "F2 Males of F1 male (0/0) x female (0/0)", 
         subtitle = paste0("Pooled sites = ", sum(dplyr::filter(filter(together, sex=="male"),gt == "0/0")$count))) + 
   #scale_x_discrete("adult_sisters", labels = c("with","without")) +
labs(fill = "Genotype") + 
  theme_classic() +
  theme(axis.text.x = element_text(size=15),
        axis.title.x = element_blank()) +scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))
  
p_nymph_pooled <- ggplot(filter(together, sex=="Nymph"), aes(fill=gt, y=freq, x=adult_sisters)) + 
    geom_bar(position="fill", stat="identity", ) +
    xlab("F2 males") + 
    ylab("Proportion of F2 genotype") +
    labs(title = "F2 Nymphs of F1 male (0/0) x female (0/0)", 
         subtitle = paste0("Pooled sites = ", sum(dplyr::filter(filter(together, sex=="Nymph"),gt == "0/0")$count))) + 
labs(fill = "Genotype") + 
  theme_classic() +
  theme(axis.text.x = element_text(size=15),
        axis.title.x = element_blank()) +scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))

p_egg_pooled <- ggplot(filter(together, sex=="Egg"), aes(fill=gt, y=freq, x=adult_sisters)) + 
    geom_bar(position="fill", stat="identity", ) +
    xlab("F2 males") + 
    ylab("Proportion of F2 genotype") +
    labs(title = "F2 Nymphs of F1 male (0/0) x female (0/0)", 
         subtitle = paste0("Pooled sites = ", sum(dplyr::filter(filter(together, sex=="Egg"),gt == "0/0")$count))) + 
labs(fill = "Genotype") + 
  theme_classic() +
  theme(axis.text.x = element_text(size=15),
        axis.title.x = element_blank()) +scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))

do this later

plot ratio for each individual: males: ggplot(filter(together, sex==“male”), aes(fill=gt, y=freq, x=adult_sisters)) + geom_bar(position=“fill”, stat=“identity”, ) + xlab(“F2 siblings”) + ylab(“Proportion of F2 genotype”) + labs(title = “F2 Males of F1 male (0/0) x female (0/0)”) + #geom_text(data = filter(together, gt == “1/1”), aes(y=freq, x=adult_sisters, label=total), vjust = 0) + facet_wrap(~ sample) + labs(fill = “Genotype”) + theme_classic() + theme(axis.text.x = element_text(size=15), axis.title.x = element_blank()) +scale_fill_manual(values=c(“#ffbf00”, “#66b032”,“#1982c4”))

(2) F1 male (1/1) x female (1/1)

we expect all F2 offspring to be homozygotic (1/1) like their parents (F1)

# define a list to put all the data frames in
obs <- list()

for (fam in family) {
 
obs[[fam]] <- table %>%
  dplyr::select(starts_with(fam)) %>%
          dplyr::filter_at(vars(matches("_fnd")), all_vars(. == "1/1")) %>% # force F0 female to be homo, like her son
dplyr::filter_at(vars(matches("_son")), all_vars(. == "1/1")) %>%
  dplyr::filter_at(vars(matches("_dat")), all_vars(. == "1/1")) %>% 
  dplyr::select(contains("grn")) %>%
  tidyr::pivot_longer(everything())  %>% 
  dplyr::rename(sample = name, gt = value) %>%
  dplyr::count(sample, gt, .drop = FALSE) %>% 
  dplyr::filter(gt %in% c("0/0", "1/1", "0/1")) %>%
  mutate(n = as.numeric(n)) %>%
  group_by(sample) %>%
  mutate(total = as.numeric(sum(n))) %>%
  dplyr::rename(count = n)
}

# bind all families together, to a final data frame containing all observed counts
observed <- do.call("rbind", obs) %>% mutate(sample = as.character(sample))
samples <- data.frame(sample = rep(unique(observed$sample), each =3), gt = rep(c("0/0", "0/1", "1/1")))
together <- left_join(samples, observed, by=c("sample","gt")) %>%  
    replace(is.na(.),0) %>% # replace "NA" with zero value when gt doesn't exists 
    group_by(sample) %>%
    mutate(total = as.numeric(sum(count))) %>%
  mutate(freq=count/total) %>% 
  left_join(meta, by="sample")

plots

p_male_pooled <- ggplot(filter(together, sex=="male"), aes(fill=gt, y=freq, x=adult_sisters)) + 
    geom_bar(position="fill", stat="identity", ) +
    xlab("F2 males") + 
    ylab("Proportion of F2 genotype") +
    labs(title = "F2 Males of F1 male (1/1) x female (1/1)", 
         subtitle = paste0("Pooled sites = ", sum(dplyr::filter(filter(together, sex=="male"),gt == "0/1")$count))) + 
   #scale_x_discrete("adult_sisters", labels = c("with","without")) +
labs(fill = "Genotype") + 
  theme_classic() +
  theme(axis.text.x = element_text(size=15),
        axis.title.x = element_blank()) +scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))
  
p_nymph_pooled <- ggplot(filter(together, sex=="Nymph"), aes(fill=gt, y=freq, x=adult_sisters)) + 
    geom_bar(position="fill", stat="identity", ) +
    xlab("F2 males") + 
    ylab("Proportion of F2 genotype") +
    labs(title = "F2 Nymphs of F1 male (1/1) x female (1/1)", 
         subtitle = paste0("Pooled sites = ", sum(dplyr::filter(filter(together, sex=="Nymph"),gt == "0/1")$count))) + 
labs(fill = "Genotype") + 
  theme_classic() +
  theme(axis.text.x = element_text(size=15),
        axis.title.x = element_blank()) +scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))

p_egg_pooled <- ggplot(filter(together, sex=="Egg"), aes(fill=gt, y=freq, x=adult_sisters)) + 
    geom_bar(position="fill", stat="identity", ) +
    xlab("F2 males") + 
    ylab("Proportion of F2 genotype") +
    labs(title = "F2 Nymphs of F1 male (1/1) x female (1/1)", 
         subtitle = paste0("Pooled sites = ", sum(dplyr::filter(filter(together, sex=="Egg"),gt == "0/1")$count))) + 
labs(fill = "Genotype") + 
  theme_classic() +
  theme(axis.text.x = element_text(size=15),
        axis.title.x = element_blank()) +scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))

grid.arrange(p_male_pooled, p_egg_pooled, p_nymph_pooled, ncol = 3)

(3) F1 male (0/0) x female (1/1)

We expect zero sites for this cross, since there is no paternal inheritance to the males.
Indeed, there are only 23 sites in the F2 pooled females, and 14 for the F2 pooled males:

# define a list to put all the data frames in
obs <- list()

for (fam in family) {
 
obs[[fam]] <- table %>%
  dplyr::select(starts_with(fam)) %>%
            dplyr::filter_at(vars(matches("_fnd")), all_vars(. == "0/0")) %>% # force F0 female to be homo, like her son
dplyr::filter_at(vars(matches("_son")), all_vars(. == "0/0")) %>%
  dplyr::filter_at(vars(matches("_dat")), all_vars(. == "1/1")) %>% 
  dplyr::select(contains("grn")) %>%
  tidyr::pivot_longer(everything())  %>% 
  dplyr::rename(sample = name, gt = value) %>%
  dplyr::count(sample, gt, .drop = FALSE) %>% 
  dplyr::filter(gt %in% c("0/0", "1/1", "0/1")) %>%
  mutate(n = as.numeric(n)) %>%
  group_by(sample) %>%
  mutate(total = as.numeric(sum(n))) %>%
  dplyr::rename(count = n)
}

# bind all families together, to a final data frame containing all observed counts
observed <- do.call("rbind", obs) %>% mutate(sample = as.character(sample))
samples <- data.frame(sample = rep(unique(observed$sample), each =3), gt = rep(c("0/0", "0/1", "1/1")))
together <- left_join(samples, observed, by=c("sample","gt")) %>%  
    replace(is.na(.),0) %>% # replace "NA" with zero value when gt doesn't exists 
    group_by(sample) %>%
    mutate(total = as.numeric(sum(count))) %>%
  mutate(freq=count/total) %>% 
  left_join(meta, by="sample")

plots

p_male_pooled <- ggplot(filter(together, sex=="male"), aes(fill=gt, y=freq, x=adult_sisters)) + 
    geom_bar(position="fill", stat="identity", ) +
    xlab("F2 males") + 
    ylab("Proportion of F2 genotype") +
    labs(title = "F2 Males of F1 male (0/0) x female (1/1)", 
         subtitle = paste0("Pooled sites = ", sum(dplyr::filter(filter(together, sex=="male"),gt == "0/1")$count))) + 
   #scale_x_discrete("adult_sisters", labels = c("with","without")) +
labs(fill = "Genotype") + 
  theme_classic() +
  theme(axis.text.x = element_text(size=15),
        axis.title.x = element_blank()) +scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))
  
p_nymph_pooled <- ggplot(filter(together, sex=="Nymph"), aes(fill=gt, y=freq, x=adult_sisters)) + 
    geom_bar(position="fill", stat="identity", ) +
    xlab("F2 males") + 
    ylab("Proportion of F2 genotype") +
    labs(title = "F2 Nymphs of F1 male (0/0) x female (1/1)", 
         subtitle = paste0("Pooled sites = ", sum(dplyr::filter(filter(together, sex=="Nymph"),gt == "0/1")$count))) + 
labs(fill = "Genotype") + 
  theme_classic() +
  theme(axis.text.x = element_text(size=15),
        axis.title.x = element_blank()) +scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))

p_egg_pooled <- ggplot(filter(together, sex=="Egg"), aes(fill=gt, y=freq, x=adult_sisters)) + 
    geom_bar(position="fill", stat="identity", ) +
    xlab("F2 males") + 
    ylab("Proportion of F2 genotype") +
    labs(title = "F2 Nymphs of F1 male (0/0) x female (1/1)", 
         subtitle = paste0("Pooled sites = ", sum(dplyr::filter(filter(together, sex=="Egg"),gt == "0/1")$count))) + 
labs(fill = "Genotype") + 
  theme_classic() +
  theme(axis.text.x = element_text(size=15),
        axis.title.x = element_blank()) +scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))

grid.arrange(p_male_pooled, p_egg_pooled, p_nymph_pooled, ncol = 3)

(4) F1 male (1/1) x female (0/0)

We expect zero sites for this cross, since there is no paternal inheritance to the males.
Indeed, there are only 24 sites in the F2 pooled females, and 15 for the F2 pooled males:

# define a list to put all the data frames in
obs <- list()

for (fam in family) {
 
obs[[fam]] <- table %>%
  dplyr::select(starts_with(fam)) %>%
            dplyr::filter_at(vars(matches("_fnd")), all_vars(. == "1/1")) %>% # force F0 female to be homo, like her son
dplyr::filter_at(vars(matches("_son")), all_vars(. == "1/1")) %>%
  dplyr::filter_at(vars(matches("_dat")), all_vars(. == "0/0")) %>% 
    dplyr::select(contains("grn")) %>%
  tidyr::pivot_longer(everything())  %>% 
  dplyr::rename(sample = name, gt = value) %>%
  dplyr::count(sample, gt, .drop = FALSE) %>% 
  dplyr::filter(gt %in% c("0/0", "1/1", "0/1")) %>%
  mutate(n = as.numeric(n)) %>%
  group_by(sample) %>%
  mutate(total = as.numeric(sum(n))) %>%
  dplyr::rename(count = n)
}

# bind all families together, to a final data frame containing all observed counts
observed <- do.call("rbind", obs) %>% mutate(sample = as.character(sample))
samples <- data.frame(sample = rep(unique(observed$sample), each =3), gt = rep(c("0/0", "0/1", "1/1")))
together <- left_join(samples, observed, by=c("sample","gt")) %>%  
    replace(is.na(.),0) %>% # replace "NA" with zero value when gt doesn't exists 
    group_by(sample) %>%
    mutate(total = as.numeric(sum(count))) %>%
  mutate(freq=count/total) %>% 
  left_join(meta, by="sample")

plots

p_male_pooled <- ggplot(filter(together, sex=="male"), aes(fill=gt, y=freq, x=adult_sisters)) + 
    geom_bar(position="fill", stat="identity", ) +
    xlab("F2 males") + 
    ylab("Proportion of F2 genotype") +
    labs(title = "F2 Males of F1 male (1/1) x female (0/0)", 
         subtitle = paste0("Pooled sites = ", sum(dplyr::filter(filter(together, sex=="male"),gt == "0/1")$count))) + 
   #scale_x_discrete("adult_sisters", labels = c("with","without")) +
labs(fill = "Genotype") + 
  theme_classic() +
  theme(axis.text.x = element_text(size=15),
        axis.title.x = element_blank()) +scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))
  
p_nymph_pooled <- ggplot(filter(together, sex=="Nymph"), aes(fill=gt, y=freq, x=adult_sisters)) + 
    geom_bar(position="fill", stat="identity", ) +
    xlab("F2 males") + 
    ylab("Proportion of F2 genotype") +
    labs(title = "F2 Nymphs of F1 male (1/1) x female (0/0)", 
         subtitle = paste0("Pooled sites = ", sum(dplyr::filter(filter(together, sex=="Nymph"),gt == "0/1")$count))) + 
labs(fill = "Genotype") + 
  theme_classic() +
  theme(axis.text.x = element_text(size=15),
        axis.title.x = element_blank()) +scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))

p_egg_pooled <- ggplot(filter(together, sex=="Egg"), aes(fill=gt, y=freq, x=adult_sisters)) + 
    geom_bar(position="fill", stat="identity", ) +
    xlab("F2 males") + 
    ylab("Proportion of F2 genotype") +
    labs(title = "F2 Nymphs of F1 male (1/1) x female (0/0)", 
         subtitle = paste0("Pooled sites = ", sum(dplyr::filter(filter(together, sex=="Egg"),gt == "0/1")$count))) + 
labs(fill = "Genotype") + 
  theme_classic() +
  theme(axis.text.x = element_text(size=15),
        axis.title.x = element_blank()) +scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))

grid.arrange(p_male_pooled, p_egg_pooled, p_nymph_pooled, ncol = 3)


Crossing homozygotic male, with heterozygotic female

(5) F1 male (0/0) x female (0/1)

F2 offspring

We expect F2 females’ sites to be evenly distributed between 0/1 and 0/0 genotypes, and F2 males to be evenly distributed between 0/0 and 1/1 (as they are hemizygote).

# define a list to put all the data frames in
obs <- list()

for (fam in family) {
 
obs[[fam]] <- table %>%
  dplyr::select(starts_with(fam)) %>%
    dplyr::filter_at(vars(matches("_fnd")), all_vars(. == "0/0")) %>% # force F0 female to be homo, like her son
  dplyr::filter_at(vars(matches("_son")), all_vars(. == "0/0")) %>%
  dplyr::filter_at(vars(matches("_dat")), all_vars(. == "0/1")) %>% 
   dplyr::select(contains("grn")) %>%
  tidyr::pivot_longer(everything())  %>% 
  dplyr::rename(sample = name, gt = value) %>%
  dplyr::count(sample, gt, .drop = FALSE) %>% 
  dplyr::filter(gt %in% c("0/0", "1/1", "0/1")) %>%
  mutate(n = as.numeric(n)) %>%
  group_by(sample) %>%
  mutate(total = as.numeric(sum(n))) %>%
  dplyr::rename(count = n)
}

# bind all families together, to a final data frame containing all observed counts
observed <- do.call("rbind", obs) %>% mutate(sample = as.character(sample))
samples <- data.frame(sample = rep(unique(observed$sample), each =3), gt = rep(c("0/0", "0/1", "1/1")))
together <- left_join(samples, observed, by=c("sample","gt")) %>%  
    replace(is.na(.),0) %>% # replace "NA" with zero value when gt doesn't exists 
    group_by(sample) %>%
    mutate(total = as.numeric(sum(count))) %>%
  mutate(freq=count/total) %>% 
  left_join(meta, by="sample")

plots

p_male_pooled <- ggplot(filter(together, sex=="male"), aes(fill=gt, y=freq, x=adult_sisters)) + 
    geom_bar(position="fill", stat="identity", ) +
    xlab("F2 males") + 
    ylab("Proportion of F2 genotype") +
    labs(title = "F2 Males of F1 male (0/0) x female (0/1)", 
         subtitle = paste0("Pooled sites = ", sum(dplyr::filter(filter(together, sex=="male"),gt == "0/1")$count))) + 
   #scale_x_discrete("adult_sisters", labels = c("with","without")) +
labs(fill = "Genotype") + 
  theme_classic() +
  theme(axis.text.x = element_text(size=15),
        axis.title.x = element_blank()) +scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))
  
p_nymph_pooled <- ggplot(filter(together, sex=="Nymph"), aes(fill=gt, y=freq, x=adult_sisters)) + 
    geom_bar(position="fill", stat="identity", ) +
    xlab("F2 males") + 
    ylab("Proportion of F2 genotype") +
    labs(title = "F2 Nymphs of F1 male (0/0) x female (0/1)", 
         subtitle = paste0("Pooled sites = ", sum(dplyr::filter(filter(together, sex=="Nymph"),gt == "0/1")$count))) + 
labs(fill = "Genotype") + 
  theme_classic() +
  theme(axis.text.x = element_text(size=15),
        axis.title.x = element_blank()) +scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))

p_egg_pooled <- ggplot(filter(together, sex=="Egg"), aes(fill=gt, y=freq, x=adult_sisters)) + 
    geom_bar(position="fill", stat="identity", ) +
    xlab("F2 males") + 
    ylab("Proportion of F2 genotype") +
    labs(title = "F2 Nymphs of F1 male (0/0) x female (0/1)", 
         subtitle = paste0("Pooled sites = ", sum(dplyr::filter(filter(together, sex=="Egg"),gt == "0/1")$count))) + 
labs(fill = "Genotype") + 
  theme_classic() +
  theme(axis.text.x = element_text(size=15),
        axis.title.x = element_blank()) +scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))

grid.arrange(p_male_pooled, p_egg_pooled, p_nymph_pooled, ncol = 3)

ggplot(filter(together, sex=="male"), aes(fill=gt, y=freq, x=adult_sisters)) + 
    geom_bar(position="fill", stat="identity", ) +
    xlab("F2 males") + 
    ylab("Proportion of F2 genotype") +
    labs(title = "F2 Males of F1 male (0/0) x female (0/1)", 
         subtitle = paste0("Pooled sites = ", sum(dplyr::filter(filter(together, sex=="male"),gt == "0/1")$count))) + 
      geom_text(data = filter(together, gt == "0/1" & sex=="male"), aes(y=freq, x=adult_sisters, label=total), vjust = 0) +
  #scale_x_discrete("adult_sisters", labels = c("with","without")) +
labs(fill = "Genotype") + 
  theme_classic() +
  theme(axis.text.x = element_text(size=15),
        axis.title.x = element_blank()) +scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4")) +
  facet_wrap(~sample)

(6) F1 male (1/1) x female (0/1)

We expect F2 females’ sites to be evenly distributed between 0/1 and 1/1 genotypes, and F2 males to be evenly distributed between 0/0 and 1/1 (as they are hemizygote).

# define a list to put all the data frames in
obs <- list()

for (fam in family) {
 
obs[[fam]] <- table %>%
  dplyr::select(starts_with(fam)) %>%
      dplyr::filter_at(vars(matches("_fnd")), all_vars(. == "1/1")) %>% # force F0 female to be homo, like her son
dplyr::filter_at(vars(matches("_son")), all_vars(. == "1/1")) %>%
  dplyr::filter_at(vars(matches("_dat")), all_vars(. == "0/1")) %>% 
 dplyr::select(contains("grn")) %>%
  tidyr::pivot_longer(everything())  %>% 
  dplyr::rename(sample = name, gt = value) %>%
  dplyr::count(sample, gt, .drop = FALSE) %>% 
  dplyr::filter(gt %in% c("0/0", "1/1", "0/1")) %>%
  mutate(n = as.numeric(n)) %>%
  group_by(sample) %>%
  mutate(total = as.numeric(sum(n))) %>%
  dplyr::rename(count = n)
}

# bind all families together, to a final data frame containing all observed counts
observed <- do.call("rbind", obs) %>% mutate(sample = as.character(sample))
samples <- data.frame(sample = rep(unique(observed$sample), each =3), gt = rep(c("0/0", "0/1", "1/1")))
together <- left_join(samples, observed, by=c("sample","gt")) %>%  
    replace(is.na(.),0) %>% # replace "NA" with zero value when gt doesn't exists 
    group_by(sample) %>%
    mutate(total = as.numeric(sum(count))) %>%
  mutate(freq=count/total) %>% 
  left_join(meta, by="sample")

plots

p_male_pooled <- ggplot(filter(together, sex=="male"), aes(fill=gt, y=freq, x=adult_sisters)) + 
    geom_bar(position="fill", stat="identity", ) +
    xlab("F2 males") + 
    ylab("Proportion of F2 genotype") +
    labs(title = "F2 Males of F1 male (1/1) x female (0/1)", 
         subtitle = paste0("Pooled sites = ", sum(dplyr::filter(filter(together, sex=="male"),gt == "0/1")$count))) + 
   #scale_x_discrete("adult_sisters", labels = c("with","without")) +
labs(fill = "Genotype") + 
  theme_classic() +
  theme(axis.text.x = element_text(size=15),
        axis.title.x = element_blank()) +scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))
  
p_nymph_pooled <- ggplot(filter(together, sex=="Nymph"), aes(fill=gt, y=freq, x=adult_sisters)) + 
    geom_bar(position="fill", stat="identity", ) +
    xlab("F2 males") + 
    ylab("Proportion of F2 genotype") +
    labs(title = "F2 Nymphs of F1 male (1/1) x female (0/1)", 
         subtitle = paste0("Pooled sites = ", sum(dplyr::filter(filter(together, sex=="Nymph"),gt == "0/1")$count))) + 
labs(fill = "Genotype") + 
  theme_classic() +
  theme(axis.text.x = element_text(size=15),
        axis.title.x = element_blank()) +scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))

p_egg_pooled <- ggplot(filter(together, sex=="Egg"), aes(fill=gt, y=freq, x=adult_sisters)) + 
    geom_bar(position="fill", stat="identity", ) +
    xlab("F2 males") + 
    ylab("Proportion of F2 genotype") +
    labs(title = "F2 Nymphs of F1 male (1/1) x female (0/1)", 
         subtitle = paste0("Pooled sites = ", sum(dplyr::filter(filter(together, sex=="Egg"),gt == "0/1")$count))) + 
labs(fill = "Genotype") + 
  theme_classic() +
  theme(axis.text.x = element_text(size=15),
        axis.title.x = element_blank()) +scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))

grid.arrange(p_male_pooled, p_egg_pooled, p_nymph_pooled, ncol = 3)

ggplot(filter(together, sex=="male"), aes(fill=gt, y=freq, x=adult_sisters)) + 
    geom_bar(position="fill", stat="identity", ) +
    xlab("F2 males") + 
    ylab("Proportion of F2 genotype") +
    labs(title = "F2 Males of F1 male (1/1) x female (0/1)", 
         subtitle = paste0("Pooled sites = ", sum(dplyr::filter(filter(together, sex=="male"),gt == "0/1")$count))) + 
      geom_text(data = filter(together, gt == "0/1" & sex=="male"), aes(y=freq, x=adult_sisters, label=total), vjust = 0) +
  #scale_x_discrete("adult_sisters", labels = c("with","without")) +
labs(fill = "Genotype") + 
  theme_classic() +
  theme(axis.text.x = element_text(size=15),
        axis.title.x = element_blank()) +scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4")) +
  facet_wrap(~sample)

Crossing heterozygotic male, with homozygotic female

The former crosses of heterozygotic females (5 and 6) show that F2 males can be heterozygotic and carry two alleles, like their mother.
But are these sites real? and if they are, can they be transmitted to their daughters?

(7) F1 male (0/1) x female (0/0)

# define a list to put all the data frames in
obs <- list()

for (fam in family) {
 
obs[[fam]] <- table %>%
  dplyr::select(starts_with(fam)) %>%
      dplyr::filter_at(vars(matches("_fnd")), all_vars(. == "0/1")) %>% # force F0 female to be homo, like her son
dplyr::filter_at(vars(matches("_son")), all_vars(. == "0/1")) %>%
  dplyr::filter_at(vars(matches("_dat")), all_vars(. == "0/0")) %>% 
dplyr::select(contains("grn")) %>%
  tidyr::pivot_longer(everything())  %>% 
  dplyr::rename(sample = name, gt = value) %>%
  dplyr::count(sample, gt, .drop = FALSE) %>% 
  dplyr::filter(gt %in% c("0/0", "1/1", "0/1")) %>%
  mutate(n = as.numeric(n)) %>%
  group_by(sample) %>%
  mutate(total = as.numeric(sum(n))) %>%
  dplyr::rename(count = n)
}

# bind all families together, to a final data frame containing all observed counts
observed <- do.call("rbind", obs) %>% mutate(sample = as.character(sample))
samples <- data.frame(sample = rep(unique(observed$sample), each =3), gt = rep(c("0/0", "0/1", "1/1")))
together <- left_join(samples, observed, by=c("sample","gt")) %>%  
    replace(is.na(.),0) %>% # replace "NA" with zero value when gt doesn't exists 
    group_by(sample) %>%
    mutate(total = as.numeric(sum(count))) %>%
  mutate(freq=count/total) %>% 
  left_join(meta, by="sample")

plots

p_male_pooled <- ggplot(filter(together, sex=="male"), aes(fill=gt, y=freq, x=adult_sisters)) + 
    geom_bar(position="fill", stat="identity", ) +
    xlab("F2 males") + 
    ylab("Proportion of F2 genotype") +
    labs(title = "F2 Males of F1 male (0/1) x female (0/0)", 
         subtitle = paste0("Pooled sites = ", sum(dplyr::filter(filter(together, sex=="male"),gt == "0/1")$count))) + 
   #scale_x_discrete("adult_sisters", labels = c("with","without")) +
labs(fill = "Genotype") + 
  theme_classic() +
  theme(axis.text.x = element_text(size=15),
        axis.title.x = element_blank()) +scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))
  
p_nymph_pooled <- ggplot(filter(together, sex=="Nymph"), aes(fill=gt, y=freq, x=adult_sisters)) + 
    geom_bar(position="fill", stat="identity", ) +
    xlab("F2 males") + 
    ylab("Proportion of F2 genotype") +
    labs(title = "F2 Nymphs of F1 male (0/1) x female (0/0)", 
         subtitle = paste0("Pooled sites = ", sum(dplyr::filter(filter(together, sex=="Nymph"),gt == "0/1")$count))) + 
labs(fill = "Genotype") + 
  theme_classic() +
  theme(axis.text.x = element_text(size=15),
        axis.title.x = element_blank()) +scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))

p_egg_pooled <- ggplot(filter(together, sex=="Egg"), aes(fill=gt, y=freq, x=adult_sisters)) + 
    geom_bar(position="fill", stat="identity", ) +
    xlab("F2 males") + 
    ylab("Proportion of F2 genotype") +
    labs(title = "F2 Nymphs of F1 male (0/1) x female (0/0)", 
         subtitle = paste0("Pooled sites = ", sum(dplyr::filter(filter(together, sex=="Egg"),gt == "0/1")$count))) + 
labs(fill = "Genotype") + 
  theme_classic() +
  theme(axis.text.x = element_text(size=15),
        axis.title.x = element_blank()) +scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))

grid.arrange(p_male_pooled, p_egg_pooled, p_nymph_pooled, ncol = 3)

males_with_sisters = together %>%
  filter(sex=="male") %>%
  filter(adult_sisters == "with adult sister") 

ggplot(males_with_sisters, aes(fill=gt, y=freq, x=adult_sisters)) + 
    geom_bar(position="fill", stat="identity", ) +
    xlab("F2 males") + 
    ylab("Proportion of F2 genotype") +
    labs(title = "F2 Males of F1 male (0/1) x female (0/0)", 
         subtitle = paste0("Pooled sites = ", sum(dplyr::filter(filter(males_with_sisters, sex=="male"),gt == "0/1")$count))) + 
      geom_text(data = filter(together, gt == "0/1" & sex=="male"), aes(y=freq, x=adult_sisters, label=total), vjust = 0) +
  #scale_x_discrete("adult_sisters", labels = c("with","without")) +
labs(fill = "Genotype") + 
  theme_classic() +
  theme(axis.text.x = element_text(size=15),
        axis.title.x = element_blank()) +scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4")) +
  facet_wrap(~sample)

#משהו כאן לא הגיוני . המספרים של האתרים לא תואמים .. איך יכול להיות שיש רק 126 אתרים????

(8) F1 male (0/1) x female (1/1)

# define a list to put all the data frames in
obs <- list()

for (fam in family) {
 
obs[[fam]] <- table %>%
  dplyr::select(starts_with(fam)) %>%
       dplyr::filter_at(vars(matches("_fnd")), all_vars(. == "0/1")) %>% # force F0 female to be homo, like her son
 dplyr::filter_at(vars(matches("_son")), all_vars(. == "0/1")) %>%
  dplyr::filter_at(vars(matches("_dat")), all_vars(. == "1/1")) %>% 
dplyr::select(contains("grn")) %>%
  tidyr::pivot_longer(everything())  %>% 
  dplyr::rename(sample = name, gt = value) %>%
  dplyr::count(sample, gt, .drop = FALSE) %>% 
  dplyr::filter(gt %in% c("0/0", "1/1", "0/1")) %>%
  mutate(n = as.numeric(n)) %>%
  group_by(sample) %>%
  mutate(total = as.numeric(sum(n))) %>%
  dplyr::rename(count = n)
}

# bind all families together, to a final data frame containing all observed counts
observed <- do.call("rbind", obs) %>% mutate(sample = as.character(sample))
samples <- data.frame(sample = rep(unique(observed$sample), each =3), gt = rep(c("0/0", "0/1", "1/1")))
together <- left_join(samples, observed, by=c("sample","gt")) %>%  
    replace(is.na(.),0) %>% # replace "NA" with zero value when gt doesn't exists 
    group_by(sample) %>%
    mutate(total = as.numeric(sum(count))) %>%
  mutate(freq=count/total) %>% 
  left_join(meta, by="sample")

plots

p_male_pooled <- ggplot(filter(together, sex=="male"), aes(fill=gt, y=freq, x=adult_sisters)) + 
    geom_bar(position="fill", stat="identity", ) +
    xlab("F2 males") + 
    ylab("Proportion of F2 genotype") +
    labs(title = "F2 Males of F1 male (0/1) x female (1/1)", 
         subtitle = paste0("Pooled sites = ", sum(dplyr::filter(filter(together, sex=="male"),gt == "0/1")$count))) + 
   #scale_x_discrete("adult_sisters", labels = c("with","without")) +
labs(fill = "Genotype") + 
  theme_classic() +
  theme(axis.text.x = element_text(size=15),
        axis.title.x = element_blank()) +scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))
  
p_nymph_pooled <- ggplot(filter(together, sex=="Nymph"), aes(fill=gt, y=freq, x=adult_sisters)) + 
    geom_bar(position="fill", stat="identity", ) +
    xlab("F2 males") + 
    ylab("Proportion of F2 genotype") +
    labs(title = "F2 Nymphs of F1 male (0/1) x female (1/1)", 
         subtitle = paste0("Pooled sites = ", sum(dplyr::filter(filter(together, sex=="Nymph"),gt == "0/1")$count))) + 
labs(fill = "Genotype") + 
  theme_classic() +
  theme(axis.text.x = element_text(size=15),
        axis.title.x = element_blank()) +scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))

p_egg_pooled <- ggplot(filter(together, sex=="Egg"), aes(fill=gt, y=freq, x=adult_sisters)) + 
    geom_bar(position="fill", stat="identity", ) +
    xlab("F2 males") + 
    ylab("Proportion of F2 genotype") +
    labs(title = "F2 Nymphs of F1 male (0/1) x female (1/1)", 
         subtitle = paste0("Pooled sites = ", sum(dplyr::filter(filter(together, sex=="Egg"),gt == "0/1")$count))) + 
labs(fill = "Genotype") + 
  theme_classic() +
  theme(axis.text.x = element_text(size=15),
        axis.title.x = element_blank()) +scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))

grid.arrange(p_male_pooled, p_egg_pooled, p_nymph_pooled, ncol = 3)

males_with_sisters = together %>%
  filter(sex=="male") %>%
  filter(adult_sisters == "with adult sister") 

ggplot(males_with_sisters, aes(fill=gt, y=freq, x=adult_sisters)) + 
    geom_bar(position="fill", stat="identity", ) +
    xlab("F2 males") + 
    ylab("Proportion of F2 genotype") +
    labs(title = "F2 Males of F1 male (0/1) x female (1.1)", 
         subtitle = paste0("Pooled sites = ", sum(dplyr::filter(filter(males_with_sisters, sex=="male"),gt == "0/1")$count))) + 
      geom_text(data = filter(together, gt == "0/1" & sex=="male"), aes(y=freq, x=adult_sisters, label=total), vjust = 0) +
  #scale_x_discrete("adult_sisters", labels = c("with","without")) +
labs(fill = "Genotype") + 
  theme_classic() +
  theme(axis.text.x = element_text(size=15),
        axis.title.x = element_blank()) +scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4")) +
  facet_wrap(~sample)

(9) F1 male (0/1) x female (0/1)

# define a list to put all the data frames in
obs <- list()

for (fam in family) {
 
obs[[fam]] <- table %>%
  dplyr::select(starts_with(fam)) %>%
        dplyr::filter_at(vars(matches("_fnd")), all_vars(. == "0/1")) %>% # force F0 female to be homo, like her son
dplyr::filter_at(vars(matches("_son")), all_vars(. == "0/1")) %>%
  dplyr::filter_at(vars(matches("_dat")), all_vars(. == "0/1")) %>% 
dplyr::select(contains("grn")) %>%
  tidyr::pivot_longer(everything())  %>% 
  dplyr::rename(sample = name, gt = value) %>%
  dplyr::count(sample, gt, .drop = FALSE) %>% 
  dplyr::filter(gt %in% c("0/0", "1/1", "0/1")) %>%
  mutate(n = as.numeric(n)) %>%
  group_by(sample) %>%
  mutate(total = as.numeric(sum(n))) %>%
  dplyr::rename(count = n)
}

# bind all families together, to a final data frame containing all observed counts
observed <- do.call("rbind", obs) %>% mutate(sample = as.character(sample))
samples <- data.frame(sample = rep(unique(observed$sample), each =3), gt = rep(c("0/0", "0/1", "1/1")))
together <- left_join(samples, observed, by=c("sample","gt")) %>%  
    replace(is.na(.),0) %>% # replace "NA" with zero value when gt doesn't exists 
    group_by(sample) %>%
    mutate(total = as.numeric(sum(count))) %>%
  mutate(freq=count/total) %>% 
  left_join(meta, by="sample")

plots

p_male_pooled <- ggplot(filter(together, sex=="male"), aes(fill=gt, y=freq, x=adult_sisters)) + 
    geom_bar(position="fill", stat="identity", ) +
    xlab("F2 males") + 
    ylab("Proportion of F2 genotype") +
    labs(title = "F2 Males of F1 male (0/1) x female (0/1)", 
         subtitle = paste0("Pooled sites = ", sum(dplyr::filter(filter(together, sex=="male"),gt == "0/1")$count))) + 
   #scale_x_discrete("adult_sisters", labels = c("with","without")) +
labs(fill = "Genotype") + 
  theme_classic() +
  theme(axis.text.x = element_text(size=15),
        axis.title.x = element_blank()) +scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))
  
p_nymph_pooled <- ggplot(filter(together, sex=="Nymph"), aes(fill=gt, y=freq, x=adult_sisters)) + 
    geom_bar(position="fill", stat="identity", ) +
    xlab("F2 males") + 
    ylab("Proportion of F2 genotype") +
    labs(title = "F2 Nymphs of F1 male (0/1) x female (0/1)", 
         subtitle = paste0("Pooled sites = ", sum(dplyr::filter(filter(together, sex=="Nymph"),gt == "0/1")$count))) + 
labs(fill = "Genotype") + 
  theme_classic() +
  theme(axis.text.x = element_text(size=15),
        axis.title.x = element_blank()) +scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))

p_egg_pooled <- ggplot(filter(together, sex=="Egg"), aes(fill=gt, y=freq, x=adult_sisters)) + 
    geom_bar(position="fill", stat="identity", ) +
    xlab("F2 males") + 
    ylab("Proportion of F2 genotype") +
    labs(title = "F2 Nymphs of F1 male (0/1) x female (0/1)", 
         subtitle = paste0("Pooled sites = ", sum(dplyr::filter(filter(together, sex=="Egg"),gt == "0/1")$count))) + 
labs(fill = "Genotype") + 
  theme_classic() +
  theme(axis.text.x = element_text(size=15),
        axis.title.x = element_blank()) +scale_fill_manual(values=c("#ffbf00", "#66b032","#1982c4"))

grid.arrange(p_male_pooled, p_egg_pooled, p_nymph_pooled, ncol = 3)